When Creative Labs found a new design methodology built around Celoxica's Handel-C -- a high-level, C-like design language -- it found a powerful design environment that could deliver working hardware in record time. For developers at Creative Labs Inc. (Milpitas, Calif.), Handel-C seemed to be an approach worth exploring. Because nothing shows the true nature of a design environment and methodology like real results, a team of two software engineers, despite having no experience in hardware design or in Handel-C, took on the task of building a hardware audio processor to test the language.
The initial phase of the project involved a two-day return to the books, in this case to technical documentation describing Handel-C and its associated environment (a development version of Celoxica's DK1 design suite). The pair discovered that Handel-C enabled designers to describe hardware at a higher level of abstraction than is possible with traditional RTL design languages such as Verilog or VHDL (see “Designing hardware algorithmically,” page 37).
After learning to use Handel-C, the designers mapped out the basic functional blocks needed to meet their key objectives: real-time performance and high-resolution processing of audio streams. Audio processing is suited to pipeline architectures and presents clear opportunities for parallelism in hardware execution. On this project, the software engineers partitioned the design into a pipeline that included the following elements:
- Digital audio extraction and PCM sample formatting. This stage extracts audio data from an ATAPI CD-ROM drive and de-interleaves the stereo samples to prepare for encoding.
- Filter stage. Here, the data is transformed from the time domain to the frequency domain and sorted into 32 subbands.
- Encoding stage. This is the data-reduction and encoding stage, which results in a high compression ratio of processed data.
- Output stage. This stage takes the encoded data and interleaves it with headers to maintain framing, and then outputs the final digital bit stream.
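As an illustration of the first stage's sample formatting, the following is a minimal software sketch of de-interleaving stereo PCM. It assumes 16-bit little-endian samples with the left channel first, as on audio CDs; the function name `deinterleave_pcm16` is hypothetical and is not taken from the Creative Labs design.

```python
import struct


def deinterleave_pcm16(raw: bytes):
    """Split interleaved 16-bit stereo PCM into separate left/right lists.

    Assumption: little-endian signed samples, ordered L, R, L, R, ...
    """
    if len(raw) % 4 != 0:
        raise ValueError("expected whole 4-byte stereo frames")
    count = len(raw) // 2                      # number of 16-bit samples
    samples = struct.unpack("<%dh" % count, raw)
    left = list(samples[0::2])                 # even positions: left channel
    right = list(samples[1::2])                # odd positions: right channel
    return left, right


# Three stereo frames packed as L, R, L, R, L, R
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = deinterleave_pcm16(raw)
```

In hardware, the same split can be done as samples stream in, with each channel written to its own buffer for the filter stage.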
For this design, the team used multiple tables to support the processing pipeline. The ease of partitioning with Handel-C meant they could experiment with various approaches for partitioning tables between on-chip RAM and ROM and off-chip RAM. To support the considerable multiplication tasks, the team allocated two multipliers in parallel to provide increased performance. The final design used a four-stage parallel pipeline architecture to permit a high degree of resource sharing for elements such as multipliers, lookup tables and shared RAM banks.
The team began the project with an existing 10,000-line software implementation of an audio processor. Focusing on the core code set, the designers began converting it into hardware through an incremental refinement process familiar to software engineers. Working on a module at a time, the team began detailed coding. Handel-C conversions typically begin by converting floating-point variables and calculations to integer-based code. This conversion can be approached methodically and can be easily tested along the way. In this case, the team changed integer lengths, recompiled the code, reconfigured the hardware and then listened for changes in audio quality.
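The trade-off behind changing integer lengths can be sketched in software. The example below is an assumption-laden illustration rather than the team's code: it converts a floating-point multiply to fixed point with a chosen number of fractional bits and measures the worst-case error on a test tone. The names `to_fixed`, `fixed_mul` and `max_error`, and the gain value, are hypothetical.

```python
import math


def to_fixed(x, frac_bits):
    """Quantize a float to a fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and rescale back to frac_bits."""
    return (a * b) >> frac_bits


def max_error(frac_bits, n=64):
    """Worst-case error of a fixed-point gain stage against the float reference."""
    scale = 1 << frac_bits
    gain = 0.7071                               # illustrative coefficient
    worst = 0.0
    for i in range(n):
        sample = 0.9 * math.sin(2 * math.pi * i / n)
        reference = sample * gain
        product = fixed_mul(to_fixed(sample, frac_bits),
                            to_fixed(gain, frac_bits), frac_bits)
        worst = max(worst, abs(reference - product / scale))
    return worst


for bits in (7, 11, 15):
    print(bits, "fractional bits -> max error", max_error(bits))
```

Shorter words save logic but raise the error, which is the kind of difference the team judged by ear after each recompile.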
Through that process of recompiling and reconfiguring, the designers quickly assessed their understanding of Handel-C by writing quick tests of basic functions, such as simple code blocks that implemented direct-memory-access data transfers. In fact, Handel-C lets designers verify function simply by analyzing the actual hardware implementation, in marked contrast with the more complex verification requirements for VLSI design methods.
The Creative team employed a simple but effective development strategy based on the existing audio processor software algorithms. As the design proceeded module by module, the design team simply substituted the actual hardware for the software pipeline stages. The existing software version was used as a testbed for verifying the function of the hardware pipeline. The hardware pipeline stages could be fed by the same inputs used to test the software version, and the output of the incrementally completed hardware pipeline could be plugged into the remainder of the software pipeline to verify overall function.
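The substitution strategy described above can be sketched as a small test harness. This is a hypothetical illustration, not the team's testbed: each stage is modeled as a function, and `hw_stages` stands in for whatever mechanism actually drives the FPGA.

```python
def run_pipeline(stages, data):
    """Pass data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data


def verify_incrementally(sw_stages, hw_stages, test_input, tolerance=0):
    """Replace software stages with hardware ones, one at a time from the front.

    After each substitution, the mixed pipeline's output is compared with
    the all-software reference output.
    """
    reference = run_pipeline(sw_stages, test_input)
    results = []
    for k in range(1, len(sw_stages) + 1):
        mixed = hw_stages[:k] + sw_stages[k:]   # first k stages in "hardware"
        out = run_pipeline(mixed, test_input)
        ok = len(out) == len(reference) and all(
            abs(a - b) <= tolerance for a, b in zip(out, reference))
        results.append(ok)
    return results


# Toy stages: the "hardware" versions use a shift instead of a multiply.
sw = [lambda xs: [x * 2 for x in xs], lambda xs: [x + 1 for x in xs]]
hw = [lambda xs: [x << 1 for x in xs], lambda xs: [x + 1 for x in xs]]
hw_bad = [lambda xs: [x << 2 for x in xs], hw[1]]   # deliberately wrong shift

good = verify_incrementally(sw, hw, [1, 2, 3])
bad = verify_incrementally(sw, hw_bad, [1, 2, 3])
```

A nonzero `tolerance` allows for the small differences that fixed-point hardware stages introduce relative to a floating-point reference.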
At this point, the development effort became largely a software development process. The primary hardware development challenge was in recognizing where resources could be shared effectively and finding the most effective pipelining approach to achieve high clock rates -- which takes more logic than nonpipelined designs.
Each week, the developers progressed further into the pipeline, converting each module of the software version, line by line, into Handel-C. Much of their effort centered on a processing requirement for raising each value to a fractional power. Because it is expensive to implement exponential calculation in hardware, the developers used a large (128-kbyte) set of lookup tables, which achieved a piecewise-linear approximation of the quantization function that was sufficiently accurate for the application.
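A piecewise-linear lookup table of this kind can be sketched as follows. The article does not give the exponent or the table layout, so the exponent of 0.75, the segment count and the input range below are assumptions chosen only for illustration; `build_table` and `pow_lut` are hypothetical names, and a hardware version would use integer tables rather than floats.

```python
def build_table(exponent, segments, x_max):
    """Precompute x**exponent at evenly spaced breakpoints from 0 to x_max."""
    step = x_max / segments
    table = [(i * step) ** exponent for i in range(segments + 1)]
    return table, step


def pow_lut(x, table, step):
    """Approximate x**exponent by linear interpolation between breakpoints."""
    idx = min(int(x / step), len(table) - 2)    # segment containing x
    frac = x / step - idx                       # position within the segment
    y0, y1 = table[idx], table[idx + 1]
    return y0 + (y1 - y0) * frac


# Assumed parameters: exponent 0.75, 1,024 segments over 0..8192
EXPONENT = 0.75
table, step = build_table(EXPONENT, 1024, 8192)

worst = max(abs(pow_lut(x, table, step) - x ** EXPONENT)
            for x in range(16, 8192))
print("max abs error for x >= 16:", worst)
```

The error is largest near zero, where the curve bends most sharply; splitting the range across several tables with different spacing is one way to keep accuracy there without enlarging every table.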
As they converted each module, the developers either simulated the new Handel-C programs or converted them directly into hardware and tested them in the hardware environment. After they completed the conversions, they developed and tested I/O modules and integrated the entire audio processor design on a single FPGA. The total elapsed time was six weeks for a basic working prototype, including the initial learning period.
Because of the ease of this approach, the team decided to take an additional week to optimize the hardware design, exploring different schemes for sharing tables or partitioning them between on-chip and off-chip memory areas. As a result, the designers realized they could collapse several tables into a smaller number that could be shared by different stages of the pipeline. They also added more fine-grained parallelism to inner loops to increase the amount of work done per clock cycle.
Quick results
After seven weeks, the two Creative Labs engineers had designed an audio processor that performed better than real time using only a 6-MHz clock rate, which they said they could raise to boost performance further. The resulting design took 93 percent of the FPGA, but could have consumed less by using block RAMs, an option that was not available in the development version of the DK1 compiler used.
Compared to the software implementation, the hardware audio processor's greatest gains were in performance per clock. The hardware managed better-than-real-time performance with a 6-MHz clock, compared with real-time performance of the software version at about 166 MHz on a Pentium. This performance came from the parallelism that could be extracted from the algorithms. For example, many of the DSP-type operations in the design could achieve multiple multiplies per clock.
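The per-clock comparison can be made concrete with a quick calculation from the two clock figures quoted above. This is only a ratio of clock rates, not a measured throughput benchmark; because the hardware ran faster than real time, the actual gain in work per clock would be somewhat higher.

```python
SOFTWARE_CLOCK_HZ = 166e6   # approximate Pentium clock for real-time software
HARDWARE_CLOCK_HZ = 6e6     # FPGA clock for better-than-real-time hardware

# Lower bound on how much more work the hardware did per clock cycle
clock_ratio = SOFTWARE_CLOCK_HZ / HARDWARE_CLOCK_HZ
print("hardware does at least %.1fx more work per clock" % clock_ratio)
```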
Celoxica design-services consultants provided advice on
space and delay optimizations
that helped the Creative Labs
team decide where and how to
share resources and pipeline. A
design-services engineer from
Celoxica optimized the filter
stage, which led to a significant
improvement (about 200 percent) in the speed of that stage.
For Creative Labs, Handel-C and its associated development environment performed well as a powerful new approach for creating high-performance hardware against tight schedules. By adopting a methodology that leverages a C-like, high-level language, software engineers can design, build and optimize hardware -- and see the results in fully functioning hardware prototypes -- in a fraction of the time needed through conventional methods. For today's electronics organizations, Handel-C offers a rapid development approach that's critically needed to provide alternatives to traditional, complex IC design approaches.